Topics
Contents: Digital Image Processing, 35 Article(s)
Few-Shot Object Detection Based on Association and Discrimination
Jianli Jia, Huiyan Han, Liqun Kuang, Fangzheng Han, Xinyi Zheng, and Xiuquan Zhang

Deep learning-based object detection algorithms have matured considerably. However, detecting novel classes from only a few samples remains challenging, as deep learning easily degrades the feature space under few-shot conditions. Most existing methods adopt a holistic fine-tuning paradigm: they pretrain on base classes with abundant samples and subsequently construct feature spaces for the novel classes. However, a novel class implicitly builds its feature space on multiple base classes, so its structure is relatively dispersed, leading to poor separability between base and novel classes. This study proposes associating each novel class with a similar base class and then discriminating between the classes for few-shot object detection. By introducing dynamic region-of-interest (RoI) heads, the model improves the utilization of training samples and explicitly constructs a feature space for each novel class based on its semantic similarity to the associated base class. Furthermore, by decoupling the classification branches of the base and novel classes, integrating channel attention modules, and applying a boundary loss function, we substantially improve the separability between the classes. Experimental results on the standard PASCAL VOC dataset show that our method surpasses the mean nAP50 scores of TFA, MPSR, and DiGeo by 10.2, 5.4, and 7.8, respectively.
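A separability objective of the kind described above can be sketched in PyTorch as a cosine classifier with an additive margin (CosFace-style). This is a minimal illustration standing in for the paper's boundary loss, not the authors' exact formulation; the margin and scale values and the prototype-weight setup are assumptions.

```python
import torch
import torch.nn.functional as F

def margin_boundary_loss(feats, weights, labels, margin=0.2, scale=16.0):
    """Additive-margin cosine loss sketch: pushes each class's RoI features
    away from other classes' decision boundaries to improve separability.
    feats: (N, D) RoI features; weights: (C, D) per-class prototypes."""
    feats = F.normalize(feats, dim=1)
    weights = F.normalize(weights, dim=1)
    logits = feats @ weights.t()                 # cosine similarities in [-1, 1]
    # Subtract the margin only on the ground-truth class column.
    logits = logits - margin * F.one_hot(labels, weights.size(0))
    return F.cross_entropy(scale * logits, labels)

# Usage with random placeholders:
loss = margin_boundary_loss(torch.randn(32, 128), torch.randn(20, 128),
                            torch.randint(0, 20, (32,)))
```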

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837015 (2024)
Real-Time Pedestrian Detection Based on Dual-Modal Relevant Image Fusion
Chengcheng Bi, Miaohua Huang, Ruoying Liu, and Liangzi Wang

To address the high missed-detection rate of single-modal images and the low detection speed of existing dual-modal image fusion in pedestrian detection under low-visibility scenes, a lightweight pedestrian detection network based on dual-modal relevant image fusion is proposed. The network is designed on the basis of YOLOv7-Tiny, and RAMFusion is embedded in the backbone to extract and aggregate complementary features of the two modalities. The 1×1 convolutions used for feature extraction are replaced with spatially aware coordinate convolutions. Soft-NMS is introduced to reduce missed detections of pedestrians in crowds, and an attention mechanism module is embedded to improve detection accuracy. Ablation experiments on the public infrared-visible pedestrian dataset LLVIP show that, compared with other fusion methods, the proposed method reduces the pedestrian missed-detection rate and significantly increases detection speed. Compared with YOLOv7-Tiny, the improved model increases detection accuracy by 2.4%, and its detection rate reaches 124 frame/s, meeting the requirements of real-time pedestrian detection in low-visibility scenes.
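Of the components named above, Soft-NMS is the most self-contained. Below is a minimal NumPy sketch of the Gaussian variant commonly used for crowded pedestrian scenes; the sigma and score-threshold values are assumptions, not taken from the paper.

```python
import numpy as np

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay (rather than discard) overlapping boxes.
    boxes: (N, 4) [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    boxes = boxes.astype(np.float64).copy()
    scores = scores.astype(np.float64).copy()
    idxs = np.arange(len(scores))
    keep = []
    while len(idxs) > 0:
        top = np.argmax(scores[idxs])
        cur = idxs[top]
        keep.append(cur)
        idxs = np.delete(idxs, top)
        if len(idxs) == 0:
            break
        # IoU between the current box and the remaining boxes.
        x1 = np.maximum(boxes[cur, 0], boxes[idxs, 0])
        y1 = np.maximum(boxes[cur, 1], boxes[idxs, 1])
        x2 = np.minimum(boxes[cur, 2], boxes[idxs, 2])
        y2 = np.minimum(boxes[cur, 3], boxes[idxs, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_cur = (boxes[cur, 2] - boxes[cur, 0]) * (boxes[cur, 3] - boxes[cur, 1])
        area_rest = (boxes[idxs, 2] - boxes[idxs, 0]) * (boxes[idxs, 3] - boxes[idxs, 1])
        iou = inter / (area_cur + area_rest - inter)
        # Gaussian decay: heavily overlapped boxes are suppressed softly.
        scores[idxs] *= np.exp(-(iou ** 2) / sigma)
        idxs = idxs[scores[idxs] > score_thresh]
    return keep
```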

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837014 (2024)
Regular Building Outline Extraction Based on Multi-Level Minimum Bounding Rectangle
Gang Li, Ke Liu, Hongchao Ma, Liang Zhang, and Jialin Yuan

Building outlines serve as data sources for various applications. However, accurately extracting outlines from scattered and irregular point clouds remains challenging. To address this issue, a method based on the multi-level minimum bounding rectangle (MBR) is proposed for extracting precise outlines of regular buildings. Initially, the boundary points are segmented into groups using an iterative region-growing technique. Subsequently, the group with the most boundary points is used to identify the initial MBR. The initial MBR is then decomposed into multi-level rectangles, ensuring that the boundary points align with rectangles of different levels. Ultimately, the outlines are generated using the multi-level MBR approach. To evaluate the effectiveness of the proposed method, experiments were conducted on regular buildings in Vaihingen. The results demonstrate that the proposed method obtains an accurate initial MBR with slightly better efficiency than the minimum-area and maximum-overlapping methods. The root mean square error of the extracted outline corners is 0.71 m, surpassing four comparison methods. In conclusion, the proposed method enables effective outline extraction for regular buildings and provides a valuable basis for subsequent three-dimensional reconstruction tasks.
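The initial-MBR step can be illustrated with OpenCV's rotated-rectangle fit, as sketched below. Only that step is shown; the iterative grouping and multi-level decomposition of the paper are not reproduced, and the point set here is a random placeholder.

```python
import cv2
import numpy as np

# Fit a minimum bounding rectangle (MBR) to 2D boundary points, as a
# stand-in for the paper's initial-MBR step on the largest boundary group.
boundary_pts = (np.random.rand(200, 2) * 100).astype(np.float32)  # placeholder

rect = cv2.minAreaRect(boundary_pts)   # ((cx, cy), (w, h), angle)
corners = cv2.boxPoints(rect)          # the 4 corner coordinates of the MBR
print("center, size, angle:", rect)
print("corners:\n", corners)
```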

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837013 (2024)
Microscopic Image Stitching Algorithm Based on Stage Motion Information
Jiaguang Huang, Zhenming Yu, Guojin Peng, Hui Gan, and Meini Lü

Traditional non-real-time image stitching methods can easily suffer global stitching interruption due to local image misalignment. In addition, microscopic images contain numerous similar microstructures, causing problems such as long feature-detection times and high mismatch rates. To address these issues, a predictive microscopic image stitching algorithm based on stage motion information is proposed. First, the size of the overlapping area between adjacent images is determined from the commanded XY travel of the motorized stage, and the speeded-up robust features (SURF) algorithm is used to detect feature points in the overlapping area. Second, the range of candidate matches is predicted from the positional relationship of the images, and the feature point with the minimum Euclidean distance within the predicted range is selected for matching. Finally, matching pairs are coarsely screened by the slope of the matching feature points, precise matching is performed using the random sample consensus algorithm to compute the homography matrix and complete the stitching, and an improved weighted-average algorithm is used to fuse the stitched images. Experimental results show that the proposed algorithm improves the matching rate by 7.95% to 26.52% compared with the brute-force and fast library for approximate nearest neighbors (FLANN) matchers, effectively improving registration accuracy. Moreover, at a resolution of 1600×1200, its multi-image stitching rate of 2 frame·s⁻¹ outperforms the AutoStitch software.
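The core speed-up, restricting feature detection to the overlap strip predicted from the stage travel, can be sketched as below. ORB stands in for SURF (which requires opencv-contrib), and plain RANSAC for the paper's slope screening plus RANSAC; the overlap width would come from the stage's commanded XY displacement.

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b, overlap_px):
    """Stage-motion-guided matching sketch: only the predicted overlap strip
    of each image is searched for features, then a homography is estimated."""
    h, w = img_a.shape[:2]
    roi_a = img_a[:, w - overlap_px:]          # right strip of left image
    roi_b = img_b[:, :overlap_px]              # left strip of right image

    orb = cv2.ORB_create(1000)                 # ORB as a SURF stand-in
    kp_a, des_a = orb.detectAndCompute(roi_a, None)
    kp_b, des_b = orb.detectAndCompute(roi_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    src[:, :, 0] += w - overlap_px             # map ROI coords back to img_a frame
    H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 3.0)
    return H
```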

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837012 (2024)
Fast Image-Stitching Algorithm of Dangerous Objects Under Vehicles
Jianjun Zhang, and Xin Jin

To address low feature-point matching accuracy, low matching speed, cracks at the stitching seams, and long stitching times in vehicle-undercarriage threat detection imaging, an optimized image-stitching algorithm is proposed. First, the features from accelerated segment test (FAST) algorithm is used to extract image feature points, and the binary robust invariant scalable keypoints (BRISK) algorithm is used to describe the retained feature points. Second, the fast library for approximate nearest neighbors (FLANN) algorithm is used for coarse matching. Next, the progressive sample consensus (PROSAC) algorithm is used to purify the feature points. Finally, the Laplacian pyramid algorithm is used for image fusion and stitching. Experiments on image data of dangerous objects under vehicles show that, compared with the SIFT, SURF, and ORB algorithms, the proposed algorithm improves feature-matching accuracy by 13.10, 8.59, and 11.27 percentage points, respectively; shortens matching time by 76.26%, 85.36%, and 10.27%, respectively; and shortens stitching time by 63.73%, 64.21%, and 20.07%, respectively, with no evident cracks at the seams. The combination of FAST, BRISK, FLANN, PROSAC, and the Laplacian pyramid therefore yields a high-quality, fast image-stitching algorithm.
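A minimal OpenCV sketch of the named chain follows. FLANN is configured with LSH indexing for BRISK's binary descriptors, and cv2.USAC_PROSAC (OpenCV >= 4.5, availability assumed) supplies PROSAC-style purification; the file names are hypothetical, and the Laplacian-pyramid blending step is omitted.

```python
import cv2
import numpy as np

img1 = cv2.imread("under_vehicle_left.png", cv2.IMREAD_GRAYSCALE)   # hypothetical
img2 = cv2.imread("under_vehicle_right.png", cv2.IMREAD_GRAYSCALE)  # hypothetical

fast = cv2.FastFeatureDetector_create(threshold=25)   # FAST corner detection
brisk = cv2.BRISK_create()                             # BRISK description
kp1 = fast.detect(img1, None); kp1, des1 = brisk.compute(img1, kp1)
kp2 = fast.detect(img2, None); kp2, des2 = brisk.compute(img2, kp2)

flann = cv2.FlannBasedMatcher(
    dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1),  # LSH
    dict(checks=50))
good = []
for pair in flann.knnMatch(des1, des2, k=2):           # coarse matching
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])                           # Lowe ratio filter

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.USAC_PROSAC, 3.0)  # purification
```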

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837011 (2024)
Infrared and Visible Image Fusion Based on Gradient Domain-Guided Filtering and Significance Analysis
Tingbo Si, Fangxiu Jia, Ziqiang Lü, and Zikang Wang

Traditional multi-scale fusion methods cannot highlight target information and often lose details and textures in fused images. Therefore, an infrared and visible image fusion method based on gradient domain-guided filtering and saliency detection is proposed. The method uses gradient domain-guided filtering to decompose the input image into base and detail layers, and a weighted global contrast method to further decompose the base layer into feature and difference layers. In the fusion process, phase consistency combined with weighted local energy, local entropy combined with weighted least-squares optimization, and averaging rules are used to fuse the feature, difference, and detail layers, respectively. The experimental results show that the proposed method significantly improves multiple objective indicators compared with other methods and produces superior visual quality. The method is highly effective at highlighting target information, preserving contour details, and improving contrast and clarity.
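The base/detail decomposition at the heart of the method can be approximated with the plain guided filter from opencv-contrib, as sketched below; the paper's gradient domain-guided filter is a refinement of this, and the radius and eps values are assumptions.

```python
import cv2

# Base/detail decomposition sketch for one input modality; the same split is
# applied to both the infrared and visible images before fusion.
def decompose(img, radius=8, eps=0.04):
    """Self-guided filtering: the smoothed output is the base layer, and the
    residual is the detail layer."""
    base = cv2.ximgproc.guidedFilter(img, img, radius, eps)
    detail = img - base
    return base, detail

ir = cv2.imread("infrared.png", cv2.IMREAD_GRAYSCALE).astype("float32") / 255
vis = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE).astype("float32") / 255
base_ir, detail_ir = decompose(ir)
base_vis, detail_vis = decompose(vis)
# Downstream (not shown): split each base layer into feature/difference layers
# by weighted global contrast, fuse per the abstract's rules, and sum.
```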

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837010 (2024)
Augmented Edge Graph Convolutional Networks for Semantic Segmentation of 3D Point Clouds
Lujian Zhang, Yuanwei Bi, Yaowen Liu, and Yansen Huang

Currently, most point cloud semantic segmentation methods based on graph convolution overlook edge construction, resulting in an incomplete representation of local-region features. To address this limitation, we propose AE-GCN, a novel graph convolutional network that integrates edge enhancement with an attention mechanism. First, we incorporate neighboring-point features into the edges rather than considering only the feature differences between the central point and its neighbors. Second, we introduce an attention mechanism to ensure more comprehensive utilization of local information within the point cloud. Finally, we employ a U-shaped segmentation structure to improve the network's adaptability to semantic point cloud segmentation. Experiments on two public datasets, Toronto_3D and S3DIS, demonstrate that AE-GCN outperforms most current methods. Specifically, on Toronto_3D, AE-GCN achieves a competitive mean intersection-over-union of 80.3% and an overall accuracy of 97.1%; on S3DIS, it attains a mean intersection-over-union of 68.0% and an overall accuracy of 87.2%.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837009 (2024)
Projection Domain Denoising Method for Multi-Energy Computed Tomography via Dual-Stream Transformer
Shunxin Ouyang, Zaifeng Shi, Fanning Kong, Lili Zhang, and Qingjie Cao

Multi-energy computed tomography (CT) can resolve the absorption rates of X-ray photons of various energies in human tissues, representing a significant advance in medical imaging. To address the rapid degradation of reconstructed image quality caused by non-ideal effects such as quantum noise, a dual-stream Transformer network that applies shifted-window multi-head self-attention denoising to projection data is introduced. The shifted-window Transformer stream extracts global features of the projection data, while the locally enhanced window Transformer stream focuses on local features; this dual design exploits the non-local self-similarity of the projection data to preserve its inherent structure, and the streams are subsequently merged by residual convolution. A hybrid loss function incorporating non-local total variation supervises model training and enhances the network's sensitivity to fine details of the projection data. Experimental results demonstrate that the projection data processed by the proposed method achieve a peak signal-to-noise ratio (PSNR) of 37.7301 dB, a structural similarity index measure (SSIM) of 0.9944, and a feature similarity index measure (FSIM) of 0.9961. Compared with leading denoising techniques, the proposed method excels in noise reduction while preserving more internal features, which is crucial for accurate subsequent diagnosis.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837008 (2024)
Depth Image Super-Resolution Reconstruction Network Based on Dual Feature Fusion Guidance
Haowen Geng, Yu Wang, and Yanling Xin

A depth image super-resolution reconstruction network based on dual feature fusion guidance (DF-Net) is proposed to address the texture transfer and depth loss issues of color-guided depth image super-resolution reconstruction algorithms. To fully utilize the correlation between depth and intensity features, the network performs depth recovery and reconstruction with a dual-channel fusion module (DCM) and a dual feature guided reconstruction module (DGM). Multi-scale features of depth and intensity information are extracted using an input pyramid structure: the DCM performs inter-channel feature fusion and enhancement of depth and intensity features based on a channel attention mechanism, while the DGM provides dual-feature guidance for reconstruction by adaptively selecting and fusing depth and intensity features, increasing the guidance effect of depth features and overcoming texture transfer and depth loss. The experimental results show that the peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) of the proposed method are superior to those of methods such as RMRF, JBU, and Depth Net: the PSNR of the 4× super-resolution results increases by an average of 6.79 dB, and the RMSE decreases by an average of 0.94, achieving good depth image super-resolution reconstruction.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837007 (2024)
Template Update Mechanism for Single Target Tracking Incorporating Memory Information
Yuwen Mao, Baozhen Ge, Jianing Quan, and Qibo Chen

Single-target tracking algorithms based on the Siamese architecture suffer from untimely updates of the target state. To address this issue, a generic template update mechanism based on the dynamic fusion of templates and memory information is proposed. The mechanism uses a dual-module fusion update strategy: a memory fusion module fuses short-term memory information from the search feature map to capture target variations, and the trusted tracking result of the previous frame serves as a dynamic template. A weight fusion module then fuses the original and dynamic templates from the perspective of correlated features, so that both the original template and short-term memory contribute to more accurate target localization during tracking. The template update mechanism is applied to three mainstream algorithms, SiamRPN, SiamRPN++, and RBO, and experiments are conducted on the VOT2019 public dataset. The results show that the mechanism effectively improves tracking performance. Specifically, for SiamRPN++, the expected average overlap improves by 6.67%, accuracy improves by 0.17%, and robustness improves by 5.39%. In addition, SiamRPN++ with the mechanism tracks better in complex scenarios with occlusion, deformation, and background interference.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837006 (2024)
A Robust Image Segmentation Algorithm Based on Weighted Filtering and Kernel Metric
Yi Liu, Xiaofeng Zhang, Yujuan Sun, Hua Wang, and Caiming Zhang

Image segmentation is an important research direction in computer vision. Fuzzy clustering methods are widely applied to image segmentation because they are unsupervised. However, traditional fuzzy clustering often fails on images with strong noise and complex shapes. To solve this problem, a saliency-based weighting factor is proposed to construct a weighted filter and a pixel correlation model, which improves the algorithm's noise resistance; the proposed weighted filter outperforms the best results of traditional filters in structural similarity by 0.1. Moreover, a kernel metric is introduced to accommodate the segmentation of complex images. Extensive experiments on synthetic, natural, remote sensing, and medical images demonstrate that the proposed algorithm outperforms traditional methods in visual quality and improves segmentation accuracy by 2% over their best results.
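The kernel-metric clustering component can be sketched as a kernel fuzzy c-means iteration with a Gaussian kernel, as below; the saliency-based weighted filter and pixel correlation model are omitted, and sigma and the fuzzifier m are assumed values.

```python
import numpy as np

def kernel_fcm(x, c=2, m=2.0, sigma=1.0, iters=50, seed=0):
    """Kernel fuzzy c-means sketch on 1D pixel intensities in [0, 1].
    Uses the kernel-induced distance 1 - K(x, v) with a Gaussian kernel.
    Returns hard labels and the final cluster centers."""
    rng = np.random.default_rng(seed)
    v = rng.choice(x, c)                                  # initial centers
    for _ in range(iters):
        k = np.exp(-((x[None, :] - v[:, None]) ** 2) / sigma ** 2)  # (c, N)
        d = np.clip(1.0 - k, 1e-10, None)                 # kernel distance
        u = d ** (-1.0 / (m - 1))
        u /= u.sum(axis=0, keepdims=True)                 # fuzzy memberships
        w = (u ** m) * k
        v = (w * x[None, :]).sum(axis=1) / w.sum(axis=1)  # kernel-weighted centers
    return u.argmax(axis=0), v

labels, centers = kernel_fcm(np.random.rand(10000))       # placeholder pixels
```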

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837005 (2024)
A Pointer Meter Bilateral Image Segmentation Network Integrating Spatial Details and Semantic Features
Yaohui Zhu, Zhigang Wu, and Min Chen

To address the characteristics of small-target segmentation in pointer meter images and the limitations of existing methods, a bilateral deep-learning backbone network called BiUnet, which combines spatial details and semantic features, is proposed for pointer meter image segmentation. Starting from the BiSeNet V2 algorithm, its semantic branch, detail branch, and bilateral fusion layer are redesigned. First, ConvNeXt convolution blocks are used to adjust and optimize the detail branch, improving the extraction of pointer and scale-line boundary details. Second, the semantic branch is redesigned around the U-shaped encoder-decoder structure to integrate semantic information at different scales, improving the segmentation of small objects such as pointers and scales. Finally, a bilateral-guided splicing aggregation layer is proposed to fuse the detail-branch and semantic-branch features. Ablation experiments on a self-built meter image segmentation dataset confirm the validity and feasibility of the proposed design. Comparative experiments with different backbone networks on the meter dataset show that BiUnet reaches a mean intersection over union (mIoU) of 88.66%, which is 8.64 percentage points higher than that of the BiSeNet V2 network (80.02%), and that it achieves better segmentation accuracy than common Transformer-based and pure-convolution backbone networks.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837004 (2024)
Underwater Image Enhancement Based on Multi-Stage Collaborative Processing
Hongchun Yuan, Hualong Zhao, and Kai Gao

We propose a multi-stage underwater image enhancement model that fuses spatial details and contextual information. The model comprises three stages: the first two use encoder-decoder configurations, and the third is a parallel attention subnet. This design enables the model to learn spatial nuances and contextual information concurrently. A supervised attention module is incorporated for enhanced feature learning, and a cross-stage feature fusion mechanism is designed to consolidate the intermediate features of preceding and succeeding subnets. Comparative tests against other underwater enhancement models demonstrate that the proposed model outperforms most existing algorithms in subjective visual quality and objective evaluation metrics. Specifically, on the Test-1 dataset, it achieves a peak signal-to-noise ratio of 26.2962 dB and a structural similarity index of 0.8267.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837003 (2024)
Pulmonary Nodule Computed Tomography Image Classification Method Based on Dual-Path Cross-Fusion Network
Ping Yang, Xin Zhang, Fan Wen, Ji Tian, and Ning He

Pulmonary nodule computed tomography (CT) images exhibit diverse details and high interclass similarity. To address this problem, a dual-path cross-fusion network combining the advantages of convolutional neural networks (CNNs) and Transformers is constructed to classify pulmonary nodules more accurately. First, a global feature block based on window multi-head self-attention and shifted-window multi-head self-attention is constructed to capture the morphological features of nodules; then, a local feature block based on large-kernel attention is constructed to extract internal features such as nodule texture and density. A feature fusion block is designed to fuse the local and global features of the previous stage so that each path collects more comprehensive discriminative information. Subsequently, Kullback-Leibler (KL) divergence is introduced to increase the distribution difference between features at different scales and optimize network performance. Finally, a decision-level fusion method is used to obtain the classification results. In experiments on the LIDC-IDRI dataset, the network achieves a classification accuracy, recall, precision, specificity, and area under the curve (AUC) of 94.16%, 93.93%, 93.03%, 92.54%, and 97.02%, respectively. These results show that the method can effectively classify benign and malignant pulmonary nodules.

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837002 (2024)
Adaptive Underwater Image Enhancement Algorithm
Ning Yang, Haibing Su, and Tao Zhang

An adaptive underwater image enhancement algorithm is proposed to address the color distortion, reduced contrast, and blurring caused by the underwater imaging environment. First, based on the local and global color biases in the Lab color space, color compensation is applied to the attenuated channels, after which the gray-world algorithm restores the color balance of the image. Second, automatic color-level and gamma correction methods adjust each channel to obtain an image with high dynamic range and adequate illumination. Finally, high-frequency information is extracted by unsharp masking, and image details are enhanced to obtain a clear underwater image. The algorithm uses statistical information such as the color bias and mean square deviation of the image to achieve adaptive processing. The experimental results show that the proposed algorithm effectively removes color casts, improves contrast and clarity, and enhances visual quality. Compared with other algorithms, it also has advantages in processing efficiency and runtime.
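The three stages map onto standard operations, sketched below with fixed parameters; the paper's contribution is precisely that it adapts these parameters (compensation, gamma, sharpening strength) from color-bias and variance statistics, which this sketch does not reproduce.

```python
import cv2
import numpy as np

def enhance_underwater(bgr):
    """Pipeline sketch: gray-world color balance, gamma correction, and
    unsharp masking. Gamma (0.8) and sharpening strength (0.6) are fixed
    placeholder values here, not the paper's adaptive estimates."""
    img = bgr.astype(np.float32)
    # 1) Gray-world: scale each channel so its mean matches the global mean.
    means = img.reshape(-1, 3).mean(axis=0)
    img = np.clip(img * (means.mean() / means), 0, 255)
    # 2) Gamma correction to lift overall illumination.
    img = 255.0 * (img / 255.0) ** 0.8
    # 3) Unsharp mask: add back high-frequency detail.
    blur = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
    img = np.clip(img + 0.6 * (img - blur), 0, 255)
    return img.astype(np.uint8)
```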

Laser & Optoelectronics Progress
Apr. 25, 2024, Vol. 61 Issue 8 0837001 (2024)
Multi-Scale Feature Extraction Method of Hyperspectral Image with Attention Mechanism
Zhangchi Xu, Baofeng Guo, Wenhao Wu, Jingyun You, and Xiaotong Su

In recent years, with the development of deep learning, feature extraction methods based on deep learning have shown promising results in hyperspectral data processing. We propose a multi-scale hyperspectral image feature extraction method with an attention mechanism, comprising two parts that extract spectral and spatial features, respectively, whose outputs are combined by a score fusion strategy. In the spectral feature extraction network, the attention mechanism alleviates the vanishing-gradient problem caused by the high dimensionality of the spectra, and multi-scale spectral features are extracted. In the spatial feature extraction network, the attention mechanism helps the branch networks extract important information by making the network backbone focus on the important parts of the neighborhood. Comparative experiments against five spectral, three spatial, and three spatial-spectral joint feature extraction methods on three datasets show that the proposed method steadily and effectively improves the classification accuracy of hyperspectral images.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437010 (2024)
Tone Mapping Algorithm for High Dynamic Range Images Based on Improved Laplacian Pyramid
Bowen Zhang, Zhenping Xia, Yueyuan Zhang, Cheng Cheng, and Yujie Liu

A tone mapping algorithm for high dynamic range (HDR) images based on an improved Laplacian pyramid is proposed to enhance the rendering of HDR images on ordinary displays. The algorithm decomposes the preprocessed image into high- and low-frequency layers, which are fed into two feature-extraction sub-networks, and combines their differently featured outputs via a fine-tuning network to obtain a low dynamic range image with superior perceptual quality. An adaptive group convolution module is also designed to strengthen the sub-networks' extraction of local and global features. Test results show that, compared with existing advanced algorithms, the proposed algorithm compresses HDR image brightness better, retains more image details, and achieves superior objective quality and subjective perception.
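The high-/low-frequency split that feeds the two sub-networks can be sketched with one Laplacian pyramid level, as follows; the input file name is a placeholder, and the learned components are not shown.

```python
import cv2

# Split a preprocessed HDR image into low- and high-frequency layers via one
# Laplacian pyramid level. "scene.hdr" is a placeholder Radiance-format file.
hdr = cv2.imread("scene.hdr", cv2.IMREAD_UNCHANGED)       # float32 HDR image
lum = cv2.cvtColor(hdr, cv2.COLOR_BGR2GRAY)

small = cv2.pyrDown(lum)
low = cv2.pyrUp(small, dstsize=(lum.shape[1], lum.shape[0]))  # low-frequency layer
high = lum - low                                              # high-frequency residual
# The two layers feed the two feature-extraction sub-networks described above.
```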

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437009 (2024)
Underwater Image Enhancement Based on Multi-Scale Attention and Contrast Learning
Yue Wang, Huijie Fan, Shiben Liu, and Yandong Tang

Color distortion and blurred detail, caused by the absorption and scattering of light in water, are the two common degradations of underwater images. We propose an underwater image enhancement model based on multi-scale attention and contrastive learning to obtain underwater images with bright colors and clear details. The model adopts an encoding-decoding structure as its basic framework. To extract finer-grained features, a multi-scale channel-pixel attention module is designed in the encoder; it uses three parallel branches to extract features at different levels of the image. The features extracted by the three branches are fused and fed to subsequent encoder layers and the corresponding decoding layers to improve feature extraction and enhance details. Finally, contrastive learning is introduced to train the network and improve the quality of the enhanced images. Extensive experiments show that images enhanced by the proposed algorithm have vivid colors and complete details. The average peak signal-to-noise ratio and structural similarity index reach 25.46 dB and 0.8946, increases of 4.4% and 2.8%, respectively, over the compared methods. The average underwater color image quality index and information entropy reach 0.5802 and 7.6668, at least 2% higher than the compared methods, and the number of feature matching points increases by 24 relative to the original images.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437008 (2024)
Method to Improve Accuracy of PSF Parameters of Motion Blur Images Using Window Functions
Sanyuan Ju, and Shuhui Gao

Accurate estimation of the point spread function (PSF) is crucial for restoring images degraded by motion blur. This paper proposes a method that uses window functions to improve the accuracy of PSF parameter estimation and to eliminate the interference of the central bright line in the spectrogram with blur-angle estimation. A two-dimensional discrete Fourier transform and a logarithmic operation are applied to the motion-blurred image, and the power spectrogram is computed. A Hanning window is then applied to the spectrogram, and the image is processed with median-filter smoothing and binarization, combined with a morphological algorithm and Canny edge detection. The blur direction is then obtained using the Radon transform. Based on the estimated direction, the spectrogram is Radon-transformed along the blur angle, the spacing of the dark stripes is obtained by analyzing the distance between negative peaks, and the blur length is calculated from the relation between the dark-stripe spacing and the blur length. This completes the estimation of the two PSF parameters. Comparisons with existing algorithms show improved parameter-estimation accuracy and reduced ringing and artifacts during restoration. The proposed method makes full use of image information and is easy to apply.
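The angle-estimation stage can be sketched as below: window the log power spectrum with a Hanning function, then take the Radon projection angle with the largest variance as the stripe (blur) direction. The binarization, morphology, and Canny steps are omitted, and the variance criterion is a common simplification rather than the authors' exact procedure.

```python
import numpy as np
from skimage.transform import radon

def estimate_blur_angle(gray):
    """Estimate the motion-blur direction from a blurred grayscale image.
    The Hanning window suppresses the boundary-induced bright cross in the
    spectrum; the dark stripes of the blur show up as the projection angle
    with maximal variance in the Radon domain."""
    h, w = gray.shape
    window = np.outer(np.hanning(h), np.hanning(w))
    spec = np.abs(np.fft.fftshift(np.fft.fft2(gray * window)))
    logspec = np.log1p(spec)                        # log power spectrum
    angles = np.arange(0.0, 180.0)
    sinogram = radon(logspec, theta=angles, circle=False)
    return angles[np.argmax(sinogram.var(axis=0))]  # dominant stripe angle
```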

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437007 (2024)
City Wall Multispectral Imaging Disease Detection Method Based on Convolutional Neural Networks
Min Li, Huiqin Wang, Ke Wang, Zhan Wang, and Yuan Li

This paper proposes a nondestructive method for detecting city wall diseases using multispectral imaging and convolutional neural networks, addressing the low efficiency and susceptibility to subjective interference of traditional manual survey methods. The minimum noise fraction method is used to preprocess the multispectral imaging data of a city wall, reducing the data dimensionality while preserving the original features and suppressing noise. To address the low classification accuracy caused by the mixed, diverse pixels of different damage types, convolution operations extract wall-damage features, retaining the most important features and discarding irrelevant ones to yield a sparse network model. The extracted features are integrated through a fully connected layer, and two dropout layers are included to prevent overfitting. Finally, the trained convolutional neural network classifier detects wall damage at the pixel level on a wall multispectral dataset, and the predictions are visualized. Experimental results show an overall accuracy of 93.28% and a Kappa coefficient of 0.91, demonstrating the effectiveness of the proposed method for improving the detection accuracy of wall diseases and fully understanding their distribution.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437006 (2024)
Coal and Gangue Recognition Method Based on Dual-Channel Pseudocolor Image by Lidar
Yan Wang, Jichuan Xing, and Yaozhi Wang

The recognition accuracy and efficiency of coal and gangue strongly affect coal production capacity, but existing recognition and separation methods still have deficiencies in separation equipment, accuracy, and efficiency. Herein, a coal and gangue recognition method based on dual-channel pseudocolor lidar images and deep learning is presented. First, a height threshold is set to remove interference from the target ore based on the lidar distance-channel information, while the original point cloud is projected to a lower dimension to quickly obtain the reflection-intensity information and surface-texture features of coal and gangue. The dimension-reduced intensity and distance channels are then fused to construct a dual-channel pseudocolor image dataset of coal and gangue. On this basis, DenseNet-121 is optimized for the pseudocolor dataset, and the resulting DenseNet-40 network is used for model training and testing. The results show a coal-gangue recognition accuracy of 94.56%, demonstrating the scientific and engineering value of dual-channel pseudocolor lidar images for ore recognition.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437005 (2024)
Fast Detection Algorithm for Baggage Pallet Based on Skeleton Model
Qijun Luo, Zheng Li, and Qingji Gao

An integral task in self-service baggage check-in is detecting whether a pallet has been added under the baggage. Pallets loaded with baggage are mostly occluded; therefore, a fast detection method based on multi-layer skeleton model registration is proposed. A point cloud skeleton model and a point-line model are constructed from a 3D point cloud model to describe the pallet's characteristics. During online detection, a designed banded feature descriptor captures the border point clouds, and the proposed point-line potential-energy iterative algorithm registers the point-line model against the border points to discriminate pallets. Iterative closest point registration based on random sample consensus then achieves accurate registration and pose calculation, yielding the precise pose of the pallet. Experimental results show that the algorithm maintains an accuracy of 94% even when 70% of the pallet point cloud is missing, and runs more than six times faster than a typical algorithm.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437004 (2024)
Visual SLAM Algorithm Based on Weighted Static in Dynamic Environment
Yong Li, Haibo Wu, Wan Li, and Dongze Li

To address the low robustness and positioning accuracy of traditional visual simultaneous localization and mapping (SLAM) systems in dynamic environments, this study proposes a robust visual SLAM algorithm for indoor dynamic environments based on the ORB-SLAM2 framework. First, a semantic segmentation thread uses an improved lightweight YOLOv5 semantic segmentation network to obtain the semantic masks of dynamic objects and selects ORB feature points through these masks, while a geometric thread detects the motion-state information of dynamic objects using weighted geometric constraints. Then, an algorithm is proposed to assign weights to semantically static feature points, and joint local bundle adjustment (BA) optimization is performed over the camera poses and feature-point weights, effectively reducing the influence of dynamic feature points. Finally, experiments are conducted on the TUM dataset and in a real indoor dynamic environment. Compared with the original ORB-SLAM2, the proposed algorithm effectively improves positioning accuracy on highly dynamic sequences, reducing the root mean square error (RMSE) of the absolute and relative trajectory errors by more than 96.10% and 92.06%, respectively.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437003 (2024)
Micro-Video Event Detection Based on Deep Dynamic Semantic Correlation
Peiguang Jing, Xiaoyi Song, and Yuting Su

Micro-video event detection shows great potential in various applications. Previous studies usually ignore the importance of keyframes and focus mostly on the explicit attributes of events, neglecting latent semantic representations and their relationships. To address these problems, a deep dynamic semantic correlation method for micro-video event detection is proposed. First, a frame importance evaluation module is designed to obtain more discriminative keyframe scores, in which the joint structure of a variational autoencoder and a generative adversarial network strengthens the importance information to the greatest extent. Then, the intrinsic correlations between keyframes and the corresponding features are captured through a keyframe-guided self-attention mechanism. Finally, a hidden event-attribute correlation module based on dynamic graph convolution is designed to learn the latent semantics and correlation patterns of events. The resulting latent semantic-aware representations are used for final micro-video event detection. Experiments on public datasets and a newly constructed micro-video event detection dataset demonstrate the effectiveness of the proposed method.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437002 (2024)
Infrared and Visible Light Image Fusion Based on Image Enhancement and Secondary Nonsubsampled Contourlet Transform
Qingdian Zhao, and Dehong Yang

To address the excessive loss of detail, unclear texture, and low contrast that occur during the fusion of infrared and visible images, this study proposes a fusion method based on image enhancement and secondary nonsubsampled contourlet transform (NSCT) decomposition. First, a guided-filtering-based image enhancement algorithm improves the visibility of the visible image. Second, the enhanced visible image and the infrared image are decomposed by NSCT into low- and high-frequency subbands, and different fusion rules are applied in each subband to obtain the NSCT coefficients of the primary fused image. These coefficients are reconstructed and decomposed again into low- and high-frequency subbands, which are then fused with the corresponding subbands of the visible image to obtain the NSCT coefficients of the secondary fused image. Finally, the secondary coefficients are reconstructed by the inverse transform to obtain the final fused image. Extensive experiments on public datasets, using eight evaluation indicators to compare the proposed method with eight multi-scale fusion methods, show that the proposed method retains more details of the source images, improves edge-contour definition and overall contrast, and has advantages in both subjective visual quality and objective indicators.

Laser & Optoelectronics Progress
Feb. 25, 2024, Vol. 61 Issue 4 0437001 (2024)
Algorithm for Multifocus Image Fusion Based on Low-Rank and Sparse Matrix Decomposition and Discrete Cosine Transform
Yanqiong Shi, Changwen Wang, Rongsheng Lu, Zhao Zha, and Guang Zhu

To resolve the blurring of scattered focus edges, artifacts, and block effects in multifocus image fusion, an algorithm based on low-rank and sparse matrix decomposition (LRSMD) and the discrete cosine transform (DCT) is designed. First, the source images are decomposed into low-rank and sparse matrices using LRSMD. Subsequently, a DCT-based method detects the focused regions in the low-rank part and produces an initial focus decision map, which is verified by repeated consistency checking; meanwhile, a fusion strategy based on morphological filtering produces the fusion result of the sparse part. Finally, the two parts are fused by weighted reconstruction. The experimental results show that the proposed algorithm achieves high clarity and full focus in subjective evaluations. In objective evaluations, its best results on four metrics, namely edge information retention, peak signal-to-noise ratio, structural similarity, and correlation coefficient, improve on five mainstream algorithms by 62.3%, 6.3%, 2.2%, and 6.3%, respectively. These results show that the algorithm effectively extracts focused information from the source images, enhances focused edge details, and markedly reduces artifacts and block effects.

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037010 (2024)
Global-Sampling Spatial-Attention Module and its Application in Image Classification and Small Object Detection and Recognition
Jingyu Lu, Haiyang Zhang, Wenxin Wang, and Changming Zhao

The emergence of attention mechanisms has addressed some limitations of neural networks in using global information. However, common attention modules have receptive fields too small to attend to overall information, while existing global attention modules tend to incur high computational costs. To address these challenges, a lightweight, universal attention module termed the "global-sampling spatial-attention module" is introduced, based on convolution, pooling, and comparison operations. The module uses comparison operations to derive spatial-attention maps from the intermediate feature maps generated during deep network inference. It can be integrated directly into convolutional neural networks at minimal cost and trained end-to-end with them. The module was validated mainly on a randomly selected subset of the ImageNet-1K dataset and a proprietary low-slow-small drone dataset. Experimental results show that, compared with other modules, it improves image classification and small-object detection and recognition tasks by approximately 1 to 3 percentage points, underscoring its efficacy and applicability to small-object detection.
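Since the module's exact construction is not given in the abstract, the sketch below shows a generic pooling-plus-convolution spatial-attention block of the same drop-in, shape-preserving kind; the "comparison method" and global-sampling specifics are not reproduced, and the max/mean-pool formulation follows common practice rather than this paper.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Generic spatial-attention sketch: pool across channels, convolve, and
    produce a per-pixel attention map that reweights the feature map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)          # channel-wise mean map
        mx, _ = x.max(dim=1, keepdim=True)         # channel-wise max map
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                            # same shape as the input

feat = torch.randn(2, 64, 32, 32)
out = SpatialAttention()(feat)                     # drop-in, shape-preserving
```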

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037009 (2024)
Infrared and Visible Image Fusion Based on Saliency Adaptive Weight Map
Haiyang Ding, Mingli Dong, Chenhua Liu, Xitian Lu, and Chentong Guo

To solve the insufficient use of source-image information by existing fusion methods, a method is proposed that uses a rolling guidance filter and anisotropic diffusion to extract the base and detail layers of an image, respectively. These layers are fused using visual saliency maps and constructed weight maps, and the fused layers are merged into the final image with appropriate weights. The proposed method was tested on several scenes from an open dataset. The experimental results show that the fused images exhibit better contrast, retain richer texture in edge details, and maintain a uniform pixel-intensity distribution; their visual quality and fusion accuracy surpass existing fusion methods, with significant gains in indicators such as average gradient, information entropy, and spatial frequency.

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037008 (2024)
Low-Light Image Enhancement Algorithm Based on Multiscale Deep Curve Estimation
Hongda Guo, Xiucheng Dong, Yongkang Zheng, Yaling Ju, and Dangcheng Zhang

In this study, a low-light image enhancement algorithm based on multiscale deep curve estimation is proposed to address the poor generalization of existing algorithms. Low-light image enhancement is achieved by learning the mapping between normal-light and low-light images at different scales. The parameter-estimation network comprises three encoders at different scales and a fusion module, enabling efficient, direct learning from low-light images. Each encoder consists of cascaded convolutional and pooling layers, enabling feature reuse and improving computational efficiency. To strengthen the constraint on image brightness, a bright-channel loss function is proposed. The method is validated against six state-of-the-art algorithms on the LIME, LOL, and DICM datasets. Experimental results show that it produces enhanced images with vibrant colors, moderate brightness, and rich details, outperforming conventional algorithms in subjective visual quality and objective quantitative evaluations.
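Curve-estimation enhancement of this family typically applies a learned per-pixel quadratic curve iteratively, in the style of Zero-DCE; the sketch below shows only that application step, with a constant placeholder curve map standing in for the output of the paper's multiscale parameter-estimation network.

```python
import torch

def apply_light_curve(x, alpha, n_iter=8):
    """Iteratively apply the quadratic enhancement curve
    LE(x) = x + a * x * (1 - x), which brightens dark pixels while keeping
    values in [0, 1]. 'alpha' is the per-pixel curve-parameter map that a
    curve-estimation network would predict (not shown here)."""
    for _ in range(n_iter):
        x = x + alpha * x * (1.0 - x)
    return x

low = torch.rand(1, 3, 256, 256)            # a low-light image in [0, 1]
alpha = torch.full_like(low, 0.3)           # placeholder curve map
enhanced = apply_light_curve(low, alpha)
```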

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037007 (2024)
Efficient Global Attention Networks for Image Super-Resolution Reconstruction
Qingqing Wang, Yuelan Xin, Jia Zhao, Jiang Guo, and Haochen Wang

Current efficient super-resolution reconstruction algorithms focus mainly on reducing parameter counts while neglecting hierarchical features and underutilizing high-dimensional image features. To solve these issues, this study introduces an efficient global attention network. Its core concept is to apply cross-adaptive feature blocks for deep feature extraction at different image levels, compensating for insufficient high-frequency detail information. To enhance the reconstruction of edge details, a nearest-neighbor pixel reconstruction block that merges spatial correlation with pixel analysis is constructed. Moreover, a multistage dynamic cosine warm-restart training strategy is introduced; it stabilizes training and refines network performance through dynamic learning-rate adjustment, mitigating overfitting. Exhaustive experiments on five benchmark datasets, including Set5, show that the proposed method increases the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) by an average of 0.51 dB and 0.0078, respectively, and trims the parameter count and floating-point operations (FLOPs) by an average of 332×10³ and 70×10⁹, respectively, compared with leading networks. In conclusion, the proposed method not only reduces complexity but also excels in performance metrics and visualization, attaining remarkable network efficiency.

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037006 (2024)
Cardiac Image Segmentation by Combining Frequency Domain Prior and Feature Enhancement
Keyan Chen, Qiaohong Liu, Xiaoxiang Han, Yuanjie Lin, and Weikun Zhang

A cardiac magnetic resonance image segmentation network that combines frequency-domain prior knowledge and feature fusion enhancement is proposed to address the unclear boundaries caused by the small grayscale differences among cardiac substructures and the varying shape and size of the right-ventricular region, both of which degrade segmentation accuracy. The model is a D-shaped network comprising a frequency-domain prior-guided subnetwork and a feature fusion enhancement subnetwork. First, the original image is transformed from the spatial domain to the frequency domain by the Fourier transform to extract high-frequency edge features, and the low-level features of the prior-guided subnetwork are combined with the corresponding stages of the feature fusion enhancement subnetwork to improve edge recognition. Second, a feature fusion module with local and global attention mechanisms is introduced at the skip connections of the feature fusion enhancement subnetwork to extract contextual information and rich texture details. Finally, a Transformer module is introduced at the bottom of the network to further extract long-range semantic information and strengthen the model's representational capacity. Experimental results on the ACDC dataset show that the proposed method achieves the best objective indicators and visual results among the compared methods. Good cardiac segmentation results can serve as a reference for subsequent image analysis and clinical diagnosis.
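The frequency-domain prior, extracting high-frequency edge content before it guides the low-level features, can be sketched with a centered FFT high-pass, as follows; the cutoff radius is an assumption.

```python
import torch

def high_freq_edges(img, radius=16):
    """Extract high-frequency edge content: FFT the image, zero out the
    centered low-frequency disk, and invert. img: (H, W) float tensor."""
    f = torch.fft.fftshift(torch.fft.fft2(img))          # centered spectrum
    h, w = img.shape[-2:]
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    mask = ((yy - h // 2) ** 2 + (xx - w // 2) ** 2) > radius ** 2
    f = f * mask.to(f.dtype)                             # high-pass filtering
    return torch.fft.ifft2(torch.fft.ifftshift(f)).real  # edge-like residual

edges = high_freq_edges(torch.rand(224, 224))            # placeholder MR slice
```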

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037005 (2024)
Multi-spectral Pedestrian Detection Based on Deformable Convolution and Multi-Scale Residual Attention
Guoli Zhang, Shuai Chang, Yansong Song, and Tianci Liu

Most existing multi-spectral pedestrian detection algorithms focus on fusion methods for visible and infrared images, but fully fusing multi-spectral images requires a huge number of parameters, lowering detection speed. To solve this problem, we propose a highly time-efficient multi-spectral pedestrian detection algorithm based on YOLOv5s. To preserve detection speed, the visible and infrared images are concatenated along the channel dimension as the network input, and accuracy is improved through three modifications to the baseline. First, some standard convolutions are replaced with deformable convolutions to enhance the network's ability to extract features of irregularly shaped objects. Second, the spatial pyramid pooling module is replaced with a multi-scale residual attention module, which weakens background interference with pedestrian targets and improves detection accuracy. Finally, by changing the connection mode and adding a large-scale feature-splicing layer, the minimum detection scale of the network is increased, improving detection of small targets. Experimental results show that the improved algorithm has clear advantages in detection speed and raises mAP@0.5 and mAP@0.5:0.95 by 5.1 and 1.9 percentage points, respectively, over the original algorithm.
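The first modification, swapping standard convolutions for deformable ones, is directly expressible with torchvision, as sketched below; the channel sizes and the placement inside YOLOv5s are assumptions.

```python
import torch
from torchvision.ops import DeformConv2d

# Replace a standard 3x3 convolution with a deformable one. The offset branch
# is a plain conv predicting 2 offsets (x, y) per kernel tap per output pixel.
deform = DeformConv2d(64, 64, kernel_size=3, padding=1)
offset_pred = torch.nn.Conv2d(64, 2 * 3 * 3, kernel_size=3, padding=1)

x = torch.randn(1, 64, 80, 80)   # feature map from the YOLOv5s backbone (assumed)
offsets = offset_pred(x)         # (1, 18, 80, 80) sampling offsets
y = deform(x, offsets)           # irregular-shape-aware features, same size
```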

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037004 (2024)
Deep Iterative Filter Adaptive Network for Simple Lens Imaging System
Yi Huang, and Tao Xiong

Herein, an end-to-end deep neural network based on the iterative adaptive filtering principle is proposed to address the significant edge blurring caused by the optical structure of simple lenses. For a large-field-of-view single cemented lens, a pixel-level deblurring filter is proposed that effectively adapts to spatially varying blur and restores the blurred features of the input image. The effectiveness of the proposed method is verified through simulations and experiments on a prototype camera system.

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037003 (2024)
Image Super-Resolution Reconstruction Algorithm Based on Adaptive Two-Branch Block
Yan Zhang, Minglei Sun, Yemei Sun, and Fujie Xu

Recently, attention mechanisms have been widely applied to image super-resolution reconstruction, substantially improving network performance. To maximize their effectiveness, this paper proposes an image super-resolution reconstruction algorithm based on an adaptive two-branch block. The block comprises an attention branch and a non-attention branch, with an adaptive weight layer that dynamically balances the weights of the two branches while eliminating redundant features, ensuring an adaptive balance between them. A channel-shuffle coordinate attention block is then designed to achieve cross-group feature interaction and capture correlations between features across network layers. Furthermore, a double-layer residual aggregation block is designed to improve the network's feature extraction and the quality of the reconstructed image, with a double-layer nested residual structure for extracting deep features within the residual block. Extensive experiments on standard datasets show that the proposed method achieves better reconstruction results.

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037002 (2024)
LiDAR Point Object Primitive Obtaining Based on Multiconstraint Graph Segmentation
Zhenyang Hui, Zhuoxuan Li, Penggen Cheng, Zhaochen Cai, and Xianchun Guo

Existing methods for obtaining LiDAR point object primitives still face challenges such as heavy computation and ineffective segmentation of different building roof planes. To address them, a point object primitive extraction method based on multiconstraint graph segmentation is proposed, adopting a graph-based segmentation strategy. First, constraints on adjacent points are used to construct the network graph, reducing graph complexity and improving efficiency. Subsequently, the angle between the normal vectors of adjacent nodes is thresholded so that points lying in the same plane are grouped into the same object primitive. Finally, a maximum edge-length constraint separates building points from adjacent vegetation points. Three public test datasets provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) and two datasets from Wuhan University were used to verify the method's validity. Experimental results show that the proposed method effectively separates the different roof planes of buildings. Compared with DBSCAN and spectral clustering under precision, recall, and F1-score evaluation, the proposed method achieves the best overall segmentation on the five datasets with different building environments, with better recall and F1 score.
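The multiconstraint idea, keeping only neighbor-graph edges that satisfy a normal-angle threshold and a maximum edge length and then reading connected components as primitives, can be sketched as follows; the thresholds are assumed values, and the paper's full constraint set is richer.

```python
import numpy as np
from scipy.spatial import cKDTree

def grow_planar_primitives(pts, normals, k=10, angle_deg=10.0, max_edge=1.0):
    """Connect k-nearest neighbors, keep edges whose unit normals differ by
    less than angle_deg and whose length is below max_edge, then label the
    connected components as object primitives.
    pts: (N, 3) coordinates; normals: (N, 3) unit normals."""
    tree = cKDTree(pts)
    dist, idx = tree.query(pts, k=k + 1)          # neighbor graph (self included)
    cos_thr = np.cos(np.radians(angle_deg))
    labels = -np.ones(len(pts), dtype=int)
    cur = 0
    for seed in range(len(pts)):
        if labels[seed] >= 0:
            continue
        stack, labels[seed] = [seed], cur
        while stack:                               # flood fill over valid edges
            p = stack.pop()
            for q, d in zip(idx[p, 1:], dist[p, 1:]):
                if labels[q] < 0 and d < max_edge and \
                        abs(np.dot(normals[p], normals[q])) > cos_thr:
                    labels[q] = cur
                    stack.append(q)
        cur += 1
    return labels
```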

Laser & Optoelectronics Progress
May. 25, 2024, Vol. 61 Issue 10 1037001 (2024)